3 research outputs found

    Heterogeneous parallel virtual machine: A portable program representation and compiler for performance and energy optimizations on heterogeneous parallel systems

    Programming heterogeneous parallel systems, such as the SoCs (Systems-on-Chip) on mobile and edge devices, is extremely difficult: the diverse parallel hardware they contain exposes vastly different instruction sets, parallelism models, and memory systems. Moreover, a wide range of hardware and software approximation techniques is available for applications targeting heterogeneous SoCs, further exacerbating the programmability challenges. In this thesis, we alleviate the programmability challenges of such systems using flexible compiler intermediate representation solutions, in order to benefit from the performance and superior energy efficiency of heterogeneous systems. First, we develop the Heterogeneous Parallel Virtual Machine (HPVM), a parallel program representation for heterogeneous systems, designed to enable functional and performance portability across popular parallel hardware. HPVM is based on a hierarchical dataflow graph with side effects. HPVM successfully supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set (ISA), and a basis for runtime scheduling. We use the HPVM representation to implement an HPVM prototype, defining the HPVM IR as an extension of the Low Level Virtual Machine (LLVM) IR. Our results show performance comparable to optimized OpenCL kernels for the target hardware, obtained from a single HPVM representation using translators from the HPVM virtual ISA to native code; IR optimizations operating directly on the HPVM representation; and support for flexible runtime scheduling schemes, again from a single HPVM representation. We extend HPVM to ApproxHPVM, introducing hardware-independent approximation metrics in the IR to maintain accuracy information at the IR level and to map application-level end-to-end quality metrics to system-level "knobs". The approximation metrics quantify the acceptable accuracy loss for individual computations. Application programmers only need to specify high-level, end-to-end quality metrics, instead of detailed parameters for individual approximation methods. The ApproxHPVM system then automatically tunes the accuracy requirements of individual computations and maps them to approximate hardware when possible. ApproxHPVM results show significant performance and energy improvements for popular deep learning benchmarks. Finally, we extend ApproxHPVM to ApproxTuner, a compiler and runtime system for approximation. ApproxTuner extends ApproxHPVM with a wide range of hardware and software approximation techniques. It uses a three-step approximation tuning strategy that combines development-time, install-time, and dynamic tuning. Our strategy ensures software portability, even though approximations have highly hardware-dependent performance, and enables efficient dynamic approximation tuning despite the expensive offline steps. ApproxTuner results show significant performance and energy improvements across 7 Deep Neural Networks and 3 image processing benchmarks, and ensure that high-level end-to-end quality specifications are satisfied during adaptive approximation tuning.
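
For readers unfamiliar with the term, the following is a minimal sketch of what a hierarchical dataflow graph can look like: leaf nodes carry a computation, internal nodes carry a nested subgraph, and edges pass values between siblings. The `Node` class, the `run` function, and the `"in0"` naming are hypothetical and chosen only for illustration; they are not HPVM's IR or API, and side effects, hardware mapping, and scheduling are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A node in a toy hierarchical dataflow graph (hypothetical, not HPVM's API)."""
    name: str
    # Leaf nodes carry a computation.
    fn: Optional[Callable] = None
    # Internal nodes carry a subgraph; children are listed in topological order.
    children: List["Node"] = field(default_factory=list)
    # Child name -> names of its producers. "in0", "in1", ... denote this
    # node's own inputs; any other name refers to a sibling child.
    edges: Dict[str, List[str]] = field(default_factory=dict)


def run(node, *inputs):
    """Evaluate a node: call a leaf directly, or evaluate an internal node's subgraph."""
    if node.fn is not None:
        return node.fn(*inputs)
    values = {f"in{i}": v for i, v in enumerate(inputs)}
    for child in node.children:
        args = [values[src] for src in node.edges[child.name]]
        values[child.name] = run(child, *args)
    # By convention in this sketch, the last child produces the node's output.
    return values[node.children[-1].name]


# A two-level graph: "stage" is itself a graph nested inside "top".
double = Node("double", fn=lambda x: 2 * x)
inc = Node("inc", fn=lambda x: x + 1)
stage = Node("stage", children=[double, inc],
             edges={"double": ["in0"], "inc": ["double"]})
square = Node("square", fn=lambda x: x * x)
top = Node("top", children=[stage, square],
           edges={"stage": ["in0"], "square": ["stage"]})

result = run(top, 3)  # (2 * 3 + 1) squared
```

The nesting is the point of the sketch: a whole subgraph such as `stage` can be treated as a single node by its parent, which is what lets one representation describe parallelism at several granularities.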

    A GPU implementation of tiled belief propagation on Markov random fields

    In this work, we present a parallelized version of tiled belief propagation (BP) for stereo matching. The proposed algorithm is implemented in CUDA to leverage the parallel processing capabilities of GPUs. In our solution, the original tiled BP algorithm is combined with a number of optimizations specific to parallel programs in CUDA. For the given test inputs, the proposed solution runs in 7.96 milliseconds on an Nvidia Tesla C2050, achieving acceptable accuracy with respect to the reference code. This work was published at the 2013 Eleventh ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2013), winning the MEMOCODE Design Contest 2013 in the adjusted cost-accuracy category. To the best of the authors' knowledge, this was the first work on optimizing a parallelized version of the tiled BP algorithm. After presenting our approach to selecting an appropriate candidate algorithm for parallelization and implementing it on a GPU by applying a series of appropriate optimizations, we discuss the state of the art in stereo matching that has been presented since this work was published.
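
The abstract does not spell out the message computation, so for orientation here is a minimal sketch of the standard min-sum belief propagation message update used in stereo matching. The function name `update_message`, its parameters, and the truncated linear smoothness cost are assumptions made for illustration; the tiling scheme and the CUDA-specific optimizations described in the work are not shown.

```python
def update_message(data_cost, incoming, smooth_weight=1.0, trunc=2.0):
    """Min-sum BP message from pixel p to one neighbor q (illustrative sketch).

    data_cost: list with the matching cost D_p(d) for each disparity label d.
    incoming:  list of messages (each a list over labels) arriving at p from
               its other neighbors, i.e. excluding q.
    The smoothness cost is assumed to be truncated linear:
    smooth_weight * min(|d_p - d_q|, trunc).
    """
    num_labels = len(data_cost)
    # h(d_p): data cost plus everything p has heard from its other neighbors.
    h = [data_cost[d] + sum(m[d] for m in incoming) for d in range(num_labels)]
    out = []
    for dq in range(num_labels):
        # For each label at q, pick the cheapest compatible label at p.
        best = min(h[dp] + smooth_weight * min(abs(dp - dq), trunc)
                   for dp in range(num_labels))
        out.append(best)
    # Normalize so the smallest entry is zero, which keeps values bounded
    # over many iterations.
    lowest = min(out)
    return [v - lowest for v in out]


# Pixel p strongly prefers disparity 0; the message tells q that labels
# farther from 0 are increasingly expensive.
msg = update_message([0.0, 5.0, 5.0], incoming=[])
```

In a GPU implementation, an update of this kind would typically be computed for many pixels in parallel, since each outgoing message depends only on values from the previous iteration.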

    Type inference of polymorphic success typings in Erlang

    35 pp.
    Finding errors in programs during development, as well as checking existing code, accounts for a significant part of the time needed to develop and maintain applications. Consequently, developing tools that help the programmer locate errors is important for reducing the required time and increasing the effectiveness of these checks. This work is carried out in the context of Dialyzer, a tool that uses static analysis to identify errors in programs written in Erlang. Error detection is based on type inference using success typings, which, however, does not support polymorphic types in the arguments and return types of functions. This work extends the capabilities of Dialyzer by introducing polymorphic types, with the aim of detecting errors more precisely in programs that use polymorphic data structures.
    Error correction in programs during the development phase, as well as in existing code, tends to consume a significant fraction of programmers' time. Tools that address this problem by automating error detection reduce the time spent on development and testing as well as the number of bugs. This thesis is done in the context of Dialyzer, a static analysis tool that detects programmer errors in Erlang programs, such as definite type errors, unreachable code due to unsatisfiable conditions, and concurrency errors. To detect type errors, Dialyzer uses type inference of success typings, which, however, is currently restricted to inferring monomorphic types for the arguments and return values of functions. This thesis presents an extension of this analysis that adds parametricity to these types, making it possible to catch more errors in programs that use polymorphic types such as sets and trees.
    Μαρία Π. Κοτσιφάκο